Welcome to Embedding Models: from Architecture to Implementation.
Built in partnership with Vectara.
You may have heard of embedding vectors being used in generative AI applications. These vectors have a remarkable ability to capture the meaning of a word or phrase.
In this lesson, you will learn:
Vector embeddings map real-world entities, such as a word, sentence, or image, into vector representations, that is, points in some vector space.
A key characteristic is that points that lie close to each other in the vector space correspond to entities with similar semantic meaning.
Word2Vec was the pioneering work on learning token or word embeddings that maintain semantic meaning.
These word embedding vectors behave like vectors in a vector space, allowing algebraic operations:
Example from Star Wars text:
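As a minimal sketch of this kind of vector arithmetic, here is the classic king - man + woman ≈ queen analogy rather than the Star Wars example (the gensim model name below is an assumption):

```python
# Minimal sketch: word-vector arithmetic with pre-trained word embeddings (gensim).
# The model name "glove-wiki-gigaword-100" is an assumption; any pre-trained
# word-embedding model from gensim's downloader behaves similarly.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")   # returns KeyedVectors

# king - man + woman ≈ queen
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)   # e.g. [('queen', 0.7698...)]
```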
A sentence embedding model applies the same principle to complete sentences, converting a sentence into a vector that represents its semantic meaning.
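For instance, with the sentence-transformers library (the model name below is one common choice, assumed here rather than taken from the course):

```python
# Sketch: encoding sentences into embedding vectors with sentence-transformers.
# The checkpoint "all-MiniLM-L6-v2" is an assumption (a common general-purpose model).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = ["The cat sat on the mat.", "A feline rested on the rug."]
embeddings = model.encode(sentences)                 # one vector per sentence

# Semantically similar sentences end up with similar vectors (high cosine similarity).
print(util.cos_sim(embeddings[0], embeddings[1]))
```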
A critical component of any good RAG pipeline is the retrieval engine.
How it works: the user query is embedded, its embedding is compared against the embeddings of the stored document chunks, and the most similar chunks are passed to the LLM as context.
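A minimal sketch of that retrieval step, using brute-force cosine similarity over a handful of chunks (the model name and the chunks are illustrative assumptions):

```python
# Sketch: embed document chunks once, then retrieve the closest chunks for a query.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")      # assumed model name

chunks = [
    "Mount Everest is the tallest mountain in the world.",
    "The Pacific is the largest ocean on Earth.",
]
chunk_embs = model.encode(chunks, normalize_embeddings=True)

query = "What is the tallest mountain in the world?"
query_emb = model.encode([query], normalize_embeddings=True)[0]

scores = chunk_embs @ query_emb                      # cosine similarity (vectors are normalized)
top_k = np.argsort(-scores)[:1]
print([chunks[i] for i in top_k])                    # most relevant chunk(s) to pass to the LLM
```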
In this lesson, you will learn:
Word embedding models like Word2Vec and GloVe don't understand context: the same word can mean very different things in different sentences (for example, "bat" the animal versus "bat" the piece of sports equipment).
Using these models, both instances of "bat" would get exactly the same embedding vector despite their different meanings.
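To make the limitation concrete, a static embedding model is essentially a lookup table with one fixed vector per word (a sketch; the gensim model name and the sentences in the comments are assumptions):

```python
# Sketch: a static word-embedding model assigns exactly one vector per word,
# so "bat" gets the same embedding no matter which sentence it appears in.
# The model name and the two example sentences are illustrative assumptions.
import numpy as np
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")

bat_in_cave = vectors["bat"]   # "a bat flew out of the cave"   (the animal)
bat_in_game = vectors["bat"]   # "he swung the bat at the ball" (sports equipment)
print(np.array_equal(bat_in_cave, bat_in_game))   # True: one context-free vector
```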
In 2017, the paper "Attention Is All You Need" introduced the transformer architecture to NLP.
The transformer architecture was originally designed for translation tasks and has two components: an encoder and a decoder.
The encoder's output vectors are the contextualized vectors we're looking for.
BERT is an encoder-only transformer model heavily used in sentence embedding models.
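As a contrast to the static case above, here is a sketch of extracting contextualized token vectors from a BERT encoder via Hugging Face transformers (the checkpoint name and the two "bat" sentences are illustrative assumptions):

```python
# Sketch: contextualized embeddings from a BERT encoder (Hugging Face transformers).
# Unlike a static lookup, the vector for "bat" now depends on its surrounding sentence.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def token_vector(sentence, word):
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]          # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return hidden[tokens.index(word)]

v1 = token_vector("a bat flew out of the cave", "bat")
v2 = token_vector("he swung the bat at the ball", "bat")
print(torch.cosine_similarity(v1, v2, dim=0))   # < 1.0: context changes the embedding
```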
In this lesson, you will learn:
NLP systems deal with tokens, which can be whole words, parts of words (sub-word pieces), or individual characters and punctuation.
Each sentence is then represented as a sequence of integer IDs, one per token.
BERT has a vocabulary of about 30,000 tokens and an embedding dimension of 768.
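For example, using the BERT tokenizer from Hugging Face transformers (the bert-base-uncased checkpoint is assumed here):

```python
# Sketch: turning a sentence into integer token IDs with BERT's tokenizer.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
print(tokenizer.vocab_size)                        # ~30,000 tokens in the vocabulary

ids = tokenizer.encode("Embeddings are useful!")   # includes [CLS] and [SEP]
print(ids)                                         # sequence of integer token IDs
print(tokenizer.convert_ids_to_tokens(ids))        # whole words and sub-word pieces
```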
After the success of word embeddings, researchers explored ways to create embedding vectors for entire sentences.
These early approaches failed because they didn't properly capture the semantic meaning of the whole sentence.
Real progress in sentence embeddings came with the introduction of the dual encoder architecture.
Measuring how similar two sentences are and retrieving a relevant answer for a question are not the same goal. For example, for the question "What is the tallest mountain in the world?", we want the answer "Mount Everest is the tallest" rather than the same question back as the answer.
The dual encoder architecture has two separate encoders (question encoder and answer encoder) and is trained using a contrastive loss.
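A minimal sketch of such an architecture in PyTorch (the base checkpoint name and mean pooling are assumptions, not details given above):

```python
# Sketch: a dual encoder with separate question and answer encoders (PyTorch + transformers).
import torch.nn as nn
from transformers import AutoModel

class DualEncoder(nn.Module):
    def __init__(self, model_name="bert-base-uncased"):   # assumed base checkpoint
        super().__init__()
        self.question_encoder = AutoModel.from_pretrained(model_name)
        self.answer_encoder = AutoModel.from_pretrained(model_name)   # separate weights

    @staticmethod
    def _pool(output, attention_mask):
        # Mean pooling over non-padding tokens (one common choice, not the only one).
        mask = attention_mask.unsqueeze(-1).float()
        return (output.last_hidden_state * mask).sum(1) / mask.sum(1)

    def forward(self, q_inputs, a_inputs):
        q_emb = self._pool(self.question_encoder(**q_inputs), q_inputs["attention_mask"])
        a_emb = self._pool(self.answer_encoder(**a_inputs), a_inputs["attention_mask"])
        return q_emb, a_emb   # one embedding per question and per answer in the batch
```

Keeping the two encoders separate lets questions and answers be mapped into the shared embedding space by different sets of weights, which is what the contrastive training relies on.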
In this lesson, you will learn:
The idea behind contrastive loss is to ensure that embeddings of matching (positive) pairs end up close together in the embedding space, while embeddings of non-matching (negative) pairs are pushed far apart.
In our context: a question's embedding should be close to the embedding of its correct answer and far from the embeddings of the other answers in the batch.
In PyTorch, we use cross-entropy loss with a trick: set the target labels to zero, one, two, and so on, indicating that the correct answer for each question is the one paired with it, i.e. the diagonal of the batch's question-answer similarity matrix.
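A minimal sketch of that trick (the temperature scaling factor is a common addition assumed here, not something stated above):

```python
# Sketch: in-batch contrastive loss via cross-entropy with diagonal targets (PyTorch).
import torch
import torch.nn.functional as F

def contrastive_loss(q_emb, a_emb, temperature=0.05):
    # q_emb, a_emb: (batch, dim) outputs of the question and answer encoders.
    q = F.normalize(q_emb, dim=-1)
    a = F.normalize(a_emb, dim=-1)
    scores = q @ a.T / temperature              # (batch, batch) similarity matrix
    targets = torch.arange(scores.size(0))      # 0, 1, 2, ... -> correct answers on the diagonal
    return F.cross_entropy(scores, targets)
```

During training, q_emb and a_emb would come from the two encoders of the dual encoder, with one matching question-answer pair per row of the batch.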
The final output is a contextualized embedding that can be used for similarity comparisons.
In this lesson, you will learn:
Finding matching chunks by computing the similarity between the question embedding and every answer embedding (a brute-force search) is computationally expensive for large collections.
Instead, we use Approximate Nearest Neighbor (ANN) algorithms, which approximate the exact nearest-neighbor search with high accuracy but significantly lower compute time.
For large datasets, implement ANN using a persistent data store on disk.
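As one illustration, an HNSW index built with the faiss library (the library choice and parameters are assumptions; a production setup would persist the index or use a vector database):

```python
# Sketch: approximate nearest-neighbor search with an HNSW index (faiss assumed).
import numpy as np
import faiss

dim = 768
answer_embs = np.random.rand(10_000, dim).astype("float32")   # placeholder embeddings

index = faiss.IndexHNSWFlat(dim, 32)        # 32 = number of graph neighbors per node
index.add(answer_embs)

query = np.random.rand(1, dim).astype("float32")
distances, ids = index.search(query, 5)     # approximate top-5 matches
print(ids)
```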
In this course, you learned about word and sentence embeddings, the transformer encoder and BERT, the dual encoder architecture trained with a contrastive loss, and how embedding-based retrieval fits into a RAG pipeline.
A common practical approach is two-stage retrieval (Retrieve and Rerank): first retrieve a set of candidate chunks with a fast embedding-based search, then rerank those candidates with a slower but more accurate model.
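A sketch of the two stages, using a bi-encoder for retrieval and a cross-encoder for reranking (both checkpoint names and the cross-encoder choice are assumptions):

```python
# Sketch: two-stage retrieval: fast embedding search first, cross-encoder reranking second.
# Both model names are assumed public checkpoints, not necessarily the course's choices.
import numpy as np
from sentence_transformers import SentenceTransformer, CrossEncoder

retriever = SentenceTransformer("all-MiniLM-L6-v2")
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "What is the tallest mountain in the world?"
chunks = [
    "Mount Everest is the tallest mountain in the world.",
    "The Nile is often cited as the longest river.",
    "K2 is the second-highest mountain on Earth.",
]

# Stage 1: retrieve candidate chunks by embedding similarity (cheap, approximate).
chunk_embs = retriever.encode(chunks, normalize_embeddings=True)
query_emb = retriever.encode([query], normalize_embeddings=True)[0]
candidates = np.argsort(-(chunk_embs @ query_emb))[:2]

# Stage 2: rerank the candidates with the slower but more accurate cross-encoder.
scores = reranker.predict([(query, chunks[i]) for i in candidates])
best = candidates[int(np.argmax(scores))]
print(chunks[best])
```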
While embedding models are essential for RAG, other retrieval techniques, such as keyword-based (lexical) search, can complement neural search.
These techniques help ensure that the facts passed to the LLM are the most appropriate for responding to the user query.
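One such complementary technique is lexical keyword scoring, sketched here with the rank_bm25 package (the package and the idea of blending its scores with embedding similarity are assumptions):

```python
# Sketch: BM25 keyword scoring as a complement to embedding-based retrieval (rank_bm25 assumed).
from rank_bm25 import BM25Okapi

chunks = [
    "Mount Everest is the tallest mountain in the world.",
    "The Pacific is the largest ocean on Earth.",
]
bm25 = BM25Okapi([c.lower().split() for c in chunks])

query = "tallest mountain in the world"
lexical_scores = bm25.get_scores(query.lower().split())
print(lexical_scores)   # could be blended with embedding similarities for hybrid search
```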
Thank you for joining us to learn about sentence embeddings!